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Complex systems are characterized by a huge number of degrees of freedom often interacting 
in a non-linear manner. In many cases macroscopic states, however, can be characterized by a 
small number of order parameters that obey stochastic dynamics in time. Recently techniques for 
the estimation of the corresponding stochastic differential equations from measured data have been 
introduced. This contribution develops a framework for the estimation of the functions and their 
respective (Bayesian posterior) confidence regions based on likelihood estimators. In succession 
approximations are introduced that significantly improve the efficiency of the estimation procedure. 
While being consistent with standard approaches to the problem this contribution solves important 
problems concerning the applicability and the accuracy of estimated parameters. 
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I. INTRODUCTION 



Complex systems are characterized by a huge number of degrees of freedom. In presence of nonUnear interactions 
such systems can form macroscopically ordered states characterized by a rather smaU number of order parameters that 
dominate the dynamics at large time scale [l[ . The microscopic degrees of freedom force the system stochastically at 
small time scales which can have a significant impact on the dynamics, in particular far from equilibrium. A natural 
framework for modelling the dynamics of complex systems are stochastic differential equations of the formQ 



d_ 

dt" 



-x{t) = 7^(1) t) + ^Di^){x,t)V{t) (1) 

characterized by the drift and diffusion functions D'^^\x,t) and D^'^\x^t). Here T{t) is a Gaussian white noise source 
with correlation function (T{t)T{t')) = 2S{t — t'). The term y^D^'^^x,t)T{t) is interpreted according to Ito's definition 
of stochastic integro-differential equations Drift and diffusion account for the deterministic and the stochastic 
contributions respectively and are directly linked to the deterministic dynamics of order parameters and the character 
of the intrinsic dynamical noise. Often these functions cannot be derived explicitly since the interactions of the 
microscopic degrees of freedom are unknown. 

As recently demonstrated drift and diffusion D^^\x,t) and D^^'>{x,t) can directly be estimated from ensembles of 
time series x{t) measured on the respective systems by the definition 0, [l] 

L'(")(a;',t):= lim^((x(i + T)-a-(t))"|a;(t)=2:') , n = l,2 . (2) 

-r->0 n'.T 

Here x{t) = x' indicates, that only increments close to x' are considered for averaging, where the interpretation of 
close depends on the kernel or the binning used for estimation Q. If the process can assumed to be stationary, i.e. 
D^^^ and D^'^^ do not depend on time explicitly, ensemble averages correspond to time averages and these expressions 
can be calculated from a single time series. A qualitative estimation of drift and diffusion functions from ^ e.g. 
by evaluation of the expression for the smallest available time increment t is straightforward and has very low 
computational demands. This approach in the foUowingwill be termed direct estimation. It successfully has been 
applied to different systems ranging from turbulent flows [3, 0] and wind power converters Q to financial markets [l^l , 
traffic flow data [lH , and medical applications [H, [l^ • I^or a recent comprehensive review both on the estimation 
procedure and on applications the reader is referred to . 

If quantitative reliable estimates of drift and diffusion are required the explanatory power of the direct estimation 
is limited, since it is sensitive to the sampling frequency, the estimation kernel used, the amount of data used for 
estimation, and systematic deviations e.g. due to measurement noise. In the course of time many of these drawbacks 
have been addressed and solved, e.g. through corrections of the original estimation procedure for finite sampling 
frequencies (l5l - [l8| , through new approaches explicitly circumventing the limit of high sampling frequencies |l9l - [2l| , 
and through explicitly considering the presence of measurement noise in the estimation procedure |22h24| | . As a matter 
of fact many of these advancements, however, significantly complicate the estimation. 

This work exhibits an alternative approach to the topic, since it focuses on joint probability distributions of observed 
data and aims to find the model underlying with maximum likelihood. Likelihood estimators for the estimation of 
drift and diffusion functions already were investigated intensively in the past decade, in particular in the mathematics 
community and with respect to financial applic ations ^25. 27] . Recently, a couple a contributions emphasized their 
applicability in a more general context |28l430| . One particular advantage of the maximum likelihood approach, 
however, not sufficiently discussed so far is the fact that the results easily can be interpreted in a quantitative 
manner since confidence regions for parameters can be calculated for any parameter estimated. The aims of this work 
are twofold. At first I give an comprehensive introduction into the application of maximum likelihood estimators. 
Under certain circumstances that will be discussed sequentially the estimation procedure drastically can be reduced. 
Methods straightforwardly applicable in the respective cases are discussed in detail; examples for their numerical 
implementation are included in the appendix. For each of the procedures described, at second, methods for evaluating 
the accuracy of the estimates are described. 

We in section |TT] exemplary start from a rather simple example, an Ornstcin-Uhlenbcck process with known finite 
time propagator. On this example the concepts of maximum likelihood estimation of parameters and of Bayesian 
posterior confidence intervals for the individual parameters (section IIIip are introduced. In principle this procedure 
also is applicable to the estimation of general parametrized drift and diffusion functions [3, [3l| , though at rather 
high computational costs. The effort, however, reduces significantly for measurements available at high frequencies, 
as discussed in sections HVllVIl 

At first in section IIVI an approximation to the small-time propagator is implemented in the estimation procedure, 
which makes it applicable to more complex problems at reasonable effort. It is shown, that the estimation results are 
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satisfactory even at small but finite time lags. If drift and diffusion in addition are assumed to be smooth functions 
this approach can be extended to a procedure for the non-parametric estimation of drift and diffusion, as outlined in 
section IVl The results are demonstrated to be consistent with the direct estimation procedure, ([2]). In section IVTl this 
procedure is developed further and applied to the optimization of parametric functions for drift and diffusion, which 
results in a tremendous gain in numerical efficiency with respect to previous approaches. The method developed in 
this section exhibits the main contribution of this work. It impresses through its accuracy even at moderate amount 
of data and since its implementation neither involves numerical integration nor other computationally demanding 
operations it is straightforward to be applied even to large data sets. 

This work aims to introduce a new approach to the estimation of drift and diffusion and to discuss it in detail. In 
order to be able to focus on the problems intrinsic to the estimation procedure we work on synthetic data generated 
numerically from a stochastic differential equation of the form ([l}. That is, we at this stage explicitly exclude problems 
arising from models not conform with the data e.g. due to measurement noise. The impact of such effects needs to 
be investigated separately in future. 



II. ESTIMATION OF PARAMETRIC DRIFT AND DIFFUSION FUNCTIONS AT ARBITRARY TIME 

INCREMENT 

This section aims to introduce the basic methodology of maximum likelihood estimation of parametric drift and 
diffusion functions. 

Let us assume that we have measurements xo{tQ), xi{ti), . . .XN(t]\[) on a Markovian system at times ti = to + it 
and let us assume that the dynamics is stationary in time. If we furthermore have a model for the propagator at 
the time interval r in form of the conditional probability density function p(xi\xi-i \ f2, r), that is consistent with the 
data set and that depends on set of parameters Jl e 7^ C R™: How should we determine the unknown parameters fJ, 
that best match the measurements? 

A natural approach to this problem is to consider the probability of a certain given the realized measurements. 
In general this probability can be calculated using Bayes theorem, 

N Px{xN,---,Xl\Xf)]n,T)fn{^) 

Here, fci and fx are prior distributions of ft and the measurements x. For reasons of clarity we from now on drop 
the subscripts of probability density functions if unambiguously defined through their arguments. If we do not have 
particular information on the prior distribution it is reasonable to assume that priors arc uniformly distributed on 
appropriate intervals. Then (jSj yields 

p{ft\xN,---,Xo;T) p{xN,---,Xi\xo;ft,T) , (4) 

implying, that the probability of a certain set of parameters $7 is proportional to the joint probability of the realisation 
of the measurement data under the respective model. The idea of the maximum likelihood approach is to determine 
a set of parameters fl G V maximising expression Q . 

Since the logarithm is a strictly monotonous function maximising the likelihood of ft is equivalent to maximizing 
log [p{n\xNj • ■ • , a^o)]- Since we have a Markov process the conditional pdf p{xn, . . . , xijxo; fl) degenerates and we 
can define 

N 

L(f2) :=^log[p(x,|x,_i;n,T)] , (5) 

i=l 

which we call the log-likelihood function and which we aim to maximize with respect to fl. 

One model, on which the optimisation process can be demonstrated exemplarily and which will act as a baseline 
case in this manuscript is the Ornstcin-Uhlcnbeck process with linear repulsive drift Z?^^^(a;) = —7a; and constant 
diffusion D^^^[x) = Q. For the Ornstcin-Uhlcnbeck process the explicit form of the finite time propagator is 



p{x.^,\x,,{l,Q),r) = J 2,g(iZ,-2..) - ^Q-^l.e^V) 1 ' 



In general the parameters {j,Q) with maximum likelihood can now be obtained through numerical optimization of 
the log- likelihood function ([S|), which is quite efficient in cases where the propagator is known analytically. For the 
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10 -1.700 (-43.784,40.289)0.949 0.509 (0.228, 1.516)o.878 

100 2.734 (-1.874,7.344)0.950 0.974 (0.746, 1.305)o.945 

1000 1.444 (0.357,2.530)0.950 0.955 (0.875, 1.043)o.950 

10000 1.193 (0.889,1.497)0.950 1.007 (0.980, 1.035)o.946 

100000 1.051 (0.960,1.141)0.952 1.001 (0.992, 1.010)o.95i 

1000000 1.000 (0.972,1.028)0.955 1.000 (0.997, 1.003)o.962 



TABLE I. Parameters 7 and Q estimated from realisations of an Ornstein-Uhlenbeck by evaluation of (O for different sample 
sizes N. The brackets indicate Bayesian posterior confidence intervals (BpCIs) based on the log-hkelihood offset i?w(0.95) 
obtained from Wilks' theorem, (|lip . Subscripts to the brackets mark the realized confidence levels as calculated from equation 
(|10b|) . The data set was generated numerically by sampling with 7 = 1, Q = 1 and r = 0.01. With increasing TV the 
estimate (7, Q) converges to these parameter values. BpCIs accurately seem to model the statistical uncertainties, since the 
true values used for simulation always are within the confidence interval. The numerically calculated confidence levels in all 
but one cases are very close to the desired value of 0.95 suggesting that the log-likelihood offset calculated from Wilks' theorem 
is accurate enough for the present purpose. 



Ornstein-Uhlenbeck process the maximum can be derived analytically. The parameters maximizing the likelihood 
function only depend on certain statistical properties of the measured sample, 



■ log 



and Q 



7 { IXi - Xi-ie 



1-e 



(7) 



Table H] shows that (7, Q) converges to the correct (7, Q) for — > 00 as the statistics converge with increasing sample 
size. 



III. BAYESIAN POSTERIOR CONFIDENCE INTERVALS (BPCIS) AND WILKS' THEOREM 

With the maximum likelihood approach it is straightforward to investigate the accuracy of the estimated parameters. 
For this purpose we define Bayesian posterior confidence regions (BpCR) C € V for the estimates as compact sets 
around the optimal parameters ft that contain the true parameter with a certain probability. In practise the confidence 
level often is set to = 95%, which also is used throughout this work. 

BpCRs differ subtly from classical confidence regions as typically defined in statistics (cCR): They define sets which 
contain the true parameter with a certain probability ly given that the priors were chosen appropriately and that the 
data is consistent with the general model. One could argue that this is the best one can do from a single measurement 
on a system. In contrast cCRs consists of all parameters for that the observed realisation would belong to the ^100% 
most probable realisations. The differences between these definitions and their respective advantages intensively have 
been debated since a long time already (see e.g. [32|) and we are not intending to join this debate here. The most 
important point perhaps is, that the deviations between the definitions are rather small if the number of observations 
is sufficiently large [33, as we also will see on a practical example in section [V] And that BpCIs from the technical 
point of view have significant advantages, since they easily can be determined numerically. 

The boundaries of BpCRs still allow for some freedom. The standard definition requires to choose the boundaries 
corresponding to the smallest BpCRs [s^. A technically more convenient alternative in connection with likelihood 
estimators is to consider a threshold r of the likelihood ratio cxp[_L(r2)]/exp[L(f2)], which is bound from above by 1 at 
rj = rj. The BpCRs associated with r then are defined to extend to all parameters fl with exp[L(Jl)]/cxp[L(17)] > r 
in the compact set containing Ct. This definition of BpCRs that will be applied throughout this work. Parameters 
at the boundaries of C have in common that they are realized with a probability which is only a fraction r of the 
probability for the most likely set of parameters. Since it often is advantageous to work with log-likelihoods, a more 
practicable form of the set C is 

c = e r\L{n) > L{Ci) + Rj (8) 

with R := log(r). The boundaries of C can be iterated numerically by root finding algorithms. 

So far R was not directly related to a certain confidence level. If the integral J^^-p d™ri' exp[L(r2')] exists the 
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confidence level, however, is a function of R that can be calculated as 

_ /„,,d"'17- exp[L(»0] 
"^""^ Z^.^d'-f^' cxp[L(f2')] ■ 

In principle i? now could be iterated until the desired confidence level for the maximum likelihood estimate is achieved. 
The corresponding region C then with probability v would contain the correct estimate of the parameters. 

Instead of considering sets specifying entire regions in parameter space typically confidence intervals in the individual 
components of are derived independently, since these errors are much easier to discuss. We again stick to the 
Bayesian framework and now define Bayesian posterior confidence intervals (BpCIs). Formally these intervals can be 
obtained by restricting the evaluation of equations (|8]) and ([9|) to a certain subset of V only. For component z of fJ, 
Hi G Vi; this corresponds to the respective expressions 

C,(i?,) = {j7, e7',|L(f^i,...,f^,_i,f7,,f},+i,...,f}„)>L(fi) + i?,^ , (10a) 

i'i(Ri) = — — = = = . 10b 

d^ exp[L(r!i, . . . , ^, . . . , f}„0] 

Since we are now working in one dimension only the numerical evaluation of the expressions is straightforward. An 
alternative to iteration of equations (jlOp for the determination of the desired Ri is the application of Wilks' theorem, 
which explicitly provides -R(i') in the asymptotic limit of large test statistics [s^l- For one degree of freedom (which 
is the single component under consideration in equations ()10p ) the asymptotic results are 



Rwiiy) --^ -If-,\i.,1) , (11) 

where F~2^{v,l) is the inverse cumulative distribution function of the distribution with one degree of freedom 
evaluated at ly. For the 95%-significance level one obtains i?w(0.95) = —1.92073. 

For the Ornstein-Uhlenbeck process investigated in section |ll] BpCIs for the parameters 7 and Q were calculated 
by numerical evaluation of the interval boundaries in (jlOap for the log- likelihood offset i?w(0.95). For verification 
of the confidence level subsequently (jlObp was evaluated. The results are listed in table U as a function of sample 
size. Obviously BpCIs for the Ornstein-Uhlenbeck process accurately reflect the statistical uncertainties even at small 
sample sizes. The intervals in all cases include the true parameters used for numerical generation of the sample. The 
accuracy of Wilks' theorem in this example is satisfactory for samples of size N = 100 and larger. For this reason the 
computationally rather expensive iteration of equations (jlO|) might not be needed in many cases of practical relevance. 
For small samples validation of the confidence level with (|10b[) . however, is recommended if accuracy is crucial. 

From the example discussed in section |lT] it became evident, that calculation of maximum likelihood estimates is 
straightforward if the propagator is known explicitly. In addition BpCRs or BpCIs can be iterated numerically at a 
desired confidence level, which exhibits the main advantage of this approach. 

In principal this procedure is also applicable to the estimation of general drift and diffusion functions, since the 
propagator can be calculated for finite times by (numerical) solution of the Fokker-Planck equation 0, [25|, [3l| . Since 
calculations in this case are computationally very demanding for large data sets data aggregation e.g. by approximation 
of the fine-grained probability distribution function is recommended, which reduces the number of function evaluations 
required and transforms the problem into minimizing the KuUback-Leiblcr distance between the observed and modelled 
propagators (l9l [3l| . In this sense maximum likelihood estimation is equivalent to the maximum entropy principle 
introduced by Jaynes' HI]. Accurate estimates of BpCIs and confidence levels, however, in the general case barely 
can be obtained at reasonable numerical effort. For this reason the remaining part of the manuscript is devoted to 
approximations generally applicable in case of high sampling frequencies and smooth drift and diffusion functions, 
which drastically improve the numerical efficiency of the estimation process. 



IV. ESTIMATION OF PARAMETRIC DRIFT AND DIFFUSION FUNCTIONS AT SMALL TIME 

INCREMENT 



The direct estimation procedure briefly described in section U requires evaluation of moments in the limit r — )■ 0, 
i.e. it requires measurement data to be available at high sampling frequency. Also the maximum likelihood estimation 
considerably can benefit from high sampling frequencies. 
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The main drawback of a maximum likelihood estimation of general drift and diffusion functions is, that for the 
propagator in the log-likelihood function ^ is not available analytically but from numerical solution of the Fokker- 
Planck equation only. At sufficiently small time increments this propagator, however, can be approximated by 



p{xi+i\xi;U,T) 



cxp 



[Xr+1 



4:D(^'>{xi,n)T 



(12) 



This expression directly can be used to perform a maximum likelihood estimation with Baycsian posterior confidence 
intervals for the respective parameters as introduced in great detail in the previous section, but now for general 
parametrizations of drift and diffusion functions. Since (jl2[) is only valid at small r systematic errors, however, are 
expected if data is not available at sufficiently high sampling frequencies. 

As an example for the application of the estimation procedure we come back to an Ornstein-Uhlenbeck process. 
Samples of length N = 100000 were simulated with D^^\x) = —x and D'^'^\x) = 1 at different time increments r 
between 0.001 and 1.000. On each of the time series the parametrized drift and diffusion functions 



(13a) 
(13b) 



D'^^Xx, Vt) = UiX + n2X^ + fJgX^ 

D'^'^\x,n) = ni + n^x^ 

were optimized with respect to f2 by maximizing the log-likelihood function ([5]) with the small-r propagator ([2 
BpCIs were calculated at the 95%-level based on the log-likelihood offset i?w(0.95). An example of the numerical 
implementation of the estimation procedure is the procedure parMLE listed in appendix [B] Estimation results are 
compiled in table |lll 

The estimation procedure involved the optimization of five parameters of the low order polynomials used for Z)*-^^ 
and D'^'^\ The result of the estimation procedure was not sensitive to the starting values (for the actual calculations 
the starting value fi^ = 1 was used for all i). At the two highest sampling rates r = 0.001 and r = 0.01 the 
estimation procedure with the approximated propagator produces accurate results. This demonstrates, that the 
small-r approximation actually is applicable for data analysis purposes at sufficiently large sampling frequencies. For 
lower sampling frequencies (i.e. larger t) the estimates for ili and deviate from the true parameters since the 
finite-T approximation of the propagator is not any more valid to a sufficient extent. These deviations are systematic 
and therefore not reflected by the BpCIs. 

The BpCIs for the parameters 2, 3 and 5 always contain the value suggesting correctly that the respective coefficients 
in the drift and diffusion functions might not be required. In many applications this is a valuable information since 
more appropriate estimators might be applicable if the model can be reduced. 



V. ESTIMATION AT SMALL TIME INCREMENT AND STEPWISE CONSTANT DRIFT AND 
DIFFUSION: A NON-PARAMETRIC PROCEDURE 



In the preceding section an approximation applicable at high sampling frequencies was introduced. The estimation 
becomes even more efficient if we, in addition, approximate the resulting drift and diffusion by piecewise constant 
functions. Then maximum- likelihood estimation even is feasible for the non-parametric estimation of drift and diffusion 
functions as it will be demonstrated in this section. 

For application of the direct estimation the data typically is partitioned into a set of bins. Let us now also introduce 
a number of B non-overlapping bins, that cover the entire measurement region. Each bin i is associated with certain 
subset Bi of the measurement space implying that each measurement on the systems can be assigned to exactly one 
bin. Hence we can introduce non-overlapping sets of indexes 

A^ = {j e {0, . . . , TV - 1} \x, e B,} for i = 1, ...,B (14) 

containing the indexes of all measurements within the respective bins. In addition each bin i is assigned a position 
Xi by the centre of mass of the respective set Bi. 

We now assume, that drift and diffusion are constant within the individual bins, i.e. D'^^'>{x) = D^^'' and D^'^\x) = 

(2) 

Dl yx ^ Bi. These stepwise constant parts of the drift and diffusion functions constitute the parameters we aim 
to optimize, i.e. we have f2 = {^D^^\ . . . , D^g' ,D^^\ . . . , D^^^^ . If the data is sampled at high frequency the small-r 
approximation of the propagator, p2p . can be used and the log- likelihood function ([5]) degenerates into contributions 



r 




Qi 




^2 


^3 




^5 


0.001 


-1.23 (- 


-1.54, 


-0.93) 


-0.01 (-0.19,0.18) 


0.02 (-0.07,0.11) 


1.00 (1.01, 1.02) 


0.00 (-0.01,0.00) 


0.010 


-0.95 


-1.04, 


-0.86) 


0.04 (-0.02,0.09) 


-0.04 (-0.06,-0.01) 


0.99 (0.98, 1.00) 


0.00 (0.00,0.01) 


0.100 


-0.95 


-0.98, 


-0.92) 


0.01 (-0.01,0.02) 


0.00 (-0.01,0.00) 


0.91 (0.90,0.92) 


0.00 (-0.01,0.00) 


1.000 


-0.63 (- 


-0.63, 


-0.62) 


0.00 (0.00,0.01) 


0.00 (0.00,0.00) 


0.43 (0.43,0.44) 


0.00 (0.00,0.00) 



TABLE II. Parameters Qi to 0,5 of the drift and diffusion functions (|13p estimated from realisations of an Ornstein-Uhlenbeck process by maximization of ([5]) with 
a small-r approximation of the propagator, (|12|l . The brackets indicate Bayesian posterior confidence intervals (BpCIs) at confidence level u = 0.95, where the 
log-likelihood offset was calculated from Wilks' theorem. The estimation procedure was applied to samples of length A'' = 100000 generated at different time lag r from 
an Ornstein-Uhlenbeck process with Q± = —1, ^4 = 1, and $12 = ^3 = ^5= 0. 
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from the individuals bins, 



Xj+i - Xj - D 



(1), 



1 



log(47ri:>f' r 



B 

E 



(1) ^(2) 



(15a) 



(15b) 



Since the summands Li depend on -Dp' and -Dp' in the respective bins only, their contributions are independent 
of one another. Maximizing the joint likelihood function hence is equivalent to maximizing the contributions of the 
individual bins. This consequence of the finite-r approximation is what makes the maximum likelihood approach 
feasible for the non-parametric estimation of stepwise-constant drift and diffusion functions. 
If we introduce frequencies n,; := J2j£A ^ ^^'^ conditional moments 



.(1) 



— {xj+i — Xj) and m^-^^ 



1 



(16) 



the contributions of the individual bins are represented through 



L,iD. 



(1) r)(2)l 



log (47rL>fV 



(17) 



For calculation of the log-likelihoods, hence, not the entire data set is required. Instead the conditional moments 
rrSj^^ and rn!-p and the frequencies Ui are sufficient, which need to be calculated only once. Subsequent evaluation of 
the log- likelihood function then is computationally very efficient and independent of the actual size of the data set. 
As in section [TT] maximisation of p7l) can be performed analytically and yields 



and bf^^^im?^^ 
2t 



(18) 



Apparently the maximum likelihood estimate for the drift term is equal to the respective result from the direct 
estimation procedure The same holds for the diffusion term in the limit t ^ 0. At finite r the diffusion 

~ (2) 

estimate for bin i obtained from the direct estimation procedure differs from the maximum likelihood estimate 



by hip. 



(1)n2 



) . Interestingly this term has been discussed as a potential finite-r correction of the estimation procedure 



Beyond the calculation of the most likely parameters, likelihood functions also are required for the calculation of 
BpCIs. Details on how this procedure can be performed for the particular functions (jl7p are outlined in appendix 
\X[ together with remarks on the calculation of the respective cCIs for the direct estimation procedure. Both the 
maximum likelihood and the direct estimation procedures were applied to one realisation of an Ornstein-Uhlenbeck 
process with N = 10000 sampled at r = 0.01. The estimation results are depicted in figured] For the non-parametric 
maximum likelihood estimation (two upper rows) two different realisation of BpCIs are shown, one with constant 

(2) (2) 

Rl = i?w(0.95) (upper row) and one where the individual R] were iterated to achieve a constant significance level 

of v = 0.95 (second row). Figure [5] demonstrates the effect of the iteration of Rl on the confidence levels of BpCIs 
for the diffusion estimate. 

Results of the direct estimation procedure for the diffusion differ slightly from the maximum likelihood estimates 
psp . These differences, however, only seem to be relevant at the sparsely sampled boundaries, where the term [TZ)j^']^ 
can become large. In the central regions with a sufficient amount of samples per bin differences in the respective 
confidence intervals are relatively small. Here BpCIs correspond very well to cCIs, a fact already mentioned in section 
mil In the sparsely sampled regions cCIs of D*^^^ generally are larger than the respective BpCIs, which is a result 
of the deviation of Student's t-distribution from normality. For the diffusion estimates the respective confidence 
intervals in the sparse regions differ quite significantly: The BpCIs with constant significance level are largest and, 
therefore, probably most conservative. Comparison between the uppermost and the second row and inspection of 
figure [5] suggests that Wilks' theorem might not safely be applicable in sparsely sampled regions. All confidence 
intervals almost everywhere contain the true parameter values indicated by the dashed lines. 

Generally the differences between the maximum likelihood estimates and the direct estimation procedure are almost 
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FIG. 1. Comparison of the non-parametric estimates for drift (left column) and diffusion (right column) functions obtained 
from maximum likelihood estimation as described in section |V] (two upper rows) with the results from the direct estimation 
(lowest row). All analyses were performed on a sample of length A*' = 10000 of an Ornstein-Uhlenbeck process with 7 = 1 and 
Q = 1 sampled at time increment r = 0.01. The corresponding analytical drift and diffusion functions are indicated by the 
dashed lines. For analysis the data interval was partitioned into B = 100 bins of equal size. The points indicate the respective 
estimates and the shades parameter regions outside the Bayesian posterior confidence intervals (for the maximum likelihood 
estimates, upper two rows) and the classical confidence interval (for the direct estimates, lower row) as described in appendix 
1X1 each at 95% level. For the uppermost row the offsets of the log-likelihoods were calculated from Wilks' theorem, whereas 
the offsets for the second row were iterated until the desired confidence was reached. Results of the iteration procedure are 
detailed in figure (2] Overall the procedures yield similar results, also with respect to the confidence intervals. Iteration of the 
log-likelihood offset (innermost row) results in increased Bayesian posterior confidence intervals for the diffusion estimates, in 
particular in sparsely sampled regions. 



negligible and the potentially small improvement in the confidence intervals for D*^^) in many cases hardly may justify 
the application of this more complex estimation procedure. The non-parametric estimation generally suffers from 
huge uncertainties in sparsely sampled regions. For this reason a quantitative interpretation of the estimates often is 
difhcult. 

If a quantitative characterisation of the system is required one possibility is to fit parametrized drift and diffusion 
functions to the non-parametric estimates. From inspection of figure [1] we can understand that this procedure might 
not be straightforward, since one typically is faced with several questions: which regions should be fitted? How should 
one account for the confidence intervals, in particular for the asymmetric ones for the diffusion estimate? In particular 
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FIG. 2. Significance levels of Bayesian posterior confidence intervals for the non-parametric maximum likelihood estimates 
of the diffusion functions shown in figure [T] The data is described in detail in the caption of figure [1] The points indicate 
significance levels calculated from (|A2|l for confidence intervals determined numerically for fixed i?^^' = i?w(0.95) (red dots, 
corresponding to the upper row in figure [l]) and for offsets i?'^' iterated individually for each bin with the aim to achieve the 
desired significance level v = 0.95 (blue triangles, second row of figure [1]). Obviously iteration of i?^^' in the individual bins in 
successful in achieving confidence intervals at a certain significance level (here: 0.95). 



the latter question is not as trivial as it seems. 

If we for convenience assume that the confidence intervals for Z?'^^ are symmetrical and that they correspond to 
quantiles of a normal distribution standard procedures can be used to fit the parametrizations (jl3p of drift and 
diffusion to the direct estimates for drift and diffusion exhibited in the lower panels of figure [TJ Two least square 
fits were performed, one fitting the estimates only (NP-DE) and another weighting the estimated by the reciprocal 
squared width of the respective confidence intervals (NP-DE-CI) . The results are compiled in table IIIII and compared 
to the results from maximum likelihood estimation of the parameters as described in the previous section (MLE). 
The accuracy of the parameter estimates NP-DE and NP-DE-CI actually is fine, having in mind that cCIs were 
not considered for the estimation (NP-DE) or assumed to be symmetric (NP-DE-CI). Anyway the precision of the 
estimates and the respective confidence intervals seems to be deficient with respect to the estimation of parameters 
by maximum likelihood methods. 



VI. ESTIMATION AT SMALL TIME INCREMENT AND STEPWISE CONSTANT DRIFT AND 
DIFFUSION: AN EFFICIENT PARAMETRIC PROCEDURE 

The estimation procedure for parametric drift and diffusion as described in section IIVI yields accurate estimation 
results and Bayesian posterior confidence regions if measurement data is available at high sampling frequencies. 
A drawback of this method, however, is that the calculation of the likelihood functions becomes computationally 
demanding for large data sets or a complex parameter space. This disadvantage can be solved by combining it with 
the concept of stepwise constant drift and diffusion functions developed in the previous section. 

Let us assume that we have partitioned the data into B non-overlapping bins, calculated the respective centres 
of mass Xi^ the index sets Ai from (jl4p . the frequencies n^, and the conditional moments according to (|16p . For 

a given combination of drift and diffusion coefficients in the individual bins, ^_D^^\ . . . , I?^'', ZJp'', . . . , Dg'^ , the 

log-likelihood function then can be calculated as 

l{d^\...M^^D^^\...M^) =f:L.(z?«,i?f)) , (19) 

1=1 

where the contributions of the individual bins are given by (|17p . If we have a parametric form of D^^^ and D^^^ 
depending on a set of parameters Jl such as e.g. (|13p the coefficients in the individual bins for a particular $7 can be 
defined as 

^= (X„ O) and D^^ D^'^\X,,n) \li , (20) 
where the functions are evaluated at the centre of mass of the respective bins. Then, equation (|19p can be used for 



Method 




Oi 




^2 




^3 


^^4 






Time 


MLE 


-0.98 


(-1.28,-0.69) 


0.05 (- 


-0.13,0.24) 


-0.05 (- 


-0.15,0.03) 


0.97 (0.95, 1.00) 


0.01 (- 


-0.01,0.03) 


364.7 s 


NP-DE 


-0.58 


(-1.87,0.71) 


0.16 ( 


-0.07,0.39) 


-0.15 (- 


-0.38,0.08) 


0.92 (0.76, 1.08) 


0.05 (- 


-0.01,0.09) 


0.2 s 


NP-DE-CI 


-0.69 


(-1.14, -0.24) 


-0.05 ( 


-0.23,0.13) 


-0.19 (- 


-0.29,-0.09) 


1.02 (0.99, 1.05) 


-0.10 (- 


-0.10,-0.09) 


0.2s 


BIN-MLE 


-1.01 


(-1.31, -0.72) 


0.07 (- 


-0.12,0.26) 


-0.04 (- 


-0.14,0.05) 


0.97 (0.95, 1.00) 


0.01 (- 


-0.01,0.03) 


5.0s 



TABLE III. Parameters Cli to CI5 of the drift and diffusion functions (|13|l estimated from realisations of an Ornstein-Uhlenbeck with Qi = —1, Q4, = 1, and 
Q2 ~ O3 ~ O5 — 0. All estimates are based on the same data set of length A'' — 10000 sampled numerically at r = 0.01. The estimates in the first row, MLE, are 
obtained from maximization of the likelihood function ([5} with the finite-r approximation (|12[l . The second and third rows contain estimates obtained from fitting of 
the direct estimates for drift and diffusion as plotted in the lower panels of figure [1] with the respective polynomials. The fits for NP-DE rely only on the estimated 
values, whereas NP-DE-CI takes into account the respective (classical) confidence intervals by weighting the estimates with the reciprocal of their squared width. The 
results in the last line, BIN-MLE, are obtained from maximization of the likelihood function (|21|) as outlined in section IVII The column Time lists the computational 
time required for calculation of the respective estimates. Examples for the numerical implementation of the three estimation procedures used for generation of this 
benchmark are provided in appendix IbI 
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calculation of the log-likelihood of ft, 

B 

L{n) = J2L^[D^'Hx^,n),D^^^x,,n)) . (21) 

j=i 

This expression needs to be maximized with respect to fl. An example for the numerical implementation procedure 
is included in appendix IB] (procedure parBinMLE). Results for optimizing the drift and diffusion functions (|13|) for 
synthetic data obtained from sampling an Ornstein-Uhlenbeck process are listed in table IIIII (ease BIN-MLE, last 
row). 

Optimising expression (|2ip has advantages with respect to the procedures for parametric estimation outlined in 
sections IIVI and |Vl With respect to the optimisation of the exact small-r log- likelihood function, ^ with propagator 
P^ . the computationally effort is reduced drastically, in particular for large data sets. Once pre-calculation of the 
conditional moments is completed the effort for evaluating the log-likelihood does not depend on the dimension of 
the data set but of the number of bins only. This is an important improvement, in particular if the dimension m of 
Jl becomes large and complex maximization algorithms are applied. The significant gain in computational efficiency 
already has an effect at data sets consisting of = 10000 points, where the computational effort roughly is decreased 
by the order 10^ (rightmost column of table UlI)) . 

With respect to the procedure outlined in the previous section, i.e. calculation of a non-parametric estimate and 
fitting the respective functions, maximizing the log-likelihood yields much more precise results (table HIH) . Moreover 
the calculation of BpCIs is a consistent procedure, whereas fitting of D^^' with its asymmetric confidence intervals 
cannot be performed with many standard tools. Finally maximization of (j20|) in contrast to procedures for the non- 
parametric estimation is not very sensitive to the binning, since (|2ip for bins of equal size converges to ([5]) with 
propagator p2p in the limit S — > oo. The estimation results listed in table Hill indicate that the procedure for the 
sample data set already is sufficiently accurate with B = 100. 

VII. CONCLUSIONS 

This work addresses the estimation of drift and diffusion functions from time series data by means of maximum 
likelihood estimators with a particular focus on the statistical accuracy of results through the calculation of confidence 
intervals for the estimated parameters. The results are discussed in the light of a procedure for the direct estimation 
of drift and diffusion function, ([3]), which extensively has been used in several applications. 

From the cases studied here we can conclude that maximum likelihood estimators arc superior to the direct esti- 
mation procedure if quantitative results are required. However for a long time their application for the estimation of 
general drift and diffusion functions hardly was possible at reasonable effort, since it intensively involved numerical 
solutions of the Fokker-Planck equation (sections lU and HlI)) . Within the scope of this work more efficient methods 
for the estimation of parametric drift and diffusion functions were developed. 

In section IIVI an approximated log-likelihood function for the estimation from data sampled at high frequencies 
is introduced, which renders numerical solution of the Fokker-Planck equation unnecessary. Although estimators 
in this limit already were described earlier psj their performance, at least to my knowledge, was not investigated 
systematically. From our investigations we see that the estimator yields good results even at small but finite time 
lags f table HI)). The numerical efficiency drastically can be increased if the data, in addition, is partitioned into bins 
as described in section IVTl The application of this estimator e.g. for fitting drift and diffusion functions by low order 
polynomials such as (|13p is only slightly more complex than the direct estimation procedure. For this reason I would 
like to recommend this procedure for the quantitative estimation of parametric drift and diffusion functions from data 
available at sufficiently high sampling frequency. 

An advantage of the maximum likelihood approach is, that it is straightforward to define Bayesian posterior con- 
fidence intervals (BpCIs) for any parameter estimated. To my knowledge this problem had not been addressed in 
connection with the estimation of drift and diffusion functions, although of very high relevance for the quantitative 
discussion of results. The calculation of BpCIs is explained in great detail in section IIIII and demonstrated by the 
respective examples in sections |lT] to IVll 

In section |V] the non-parametric estimation of drift and diffusion by maximum-likelihood methods is addressed and 
discussed in the context of the direct estimation procedure, A similar procedure also is investigated in a recent 
arXiv preprint by Ohkubo which uses kernel density estimators for the transition probability density functions j29| . 
In contrast we apply a partition of the data into non-overlapping bins, which is more straightforward to implement 
and which results in a simple connection between global and local likelihood functions, ((TS|) . a fact that is essential 
for the efficient parametric estimation procedure developed subsequently. 

It is shown that the direct estimation slightly differs from the most likely estimates only in the diffusion term. 
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Interestingly these differences correspond to finite-r correction for the direct estimation procedure proposed in |16j 
that was debated intensively [l3,[lll- Results derived in section |V] and in appendix |X] also are relevant for application 
of the direct estimation procedure, which until now lacked a detailed discussion of the statistical errors. It is expected 
that the resemblance to the maximum likelihood estimates and the assumption required for this purpose (i.e. the 
small-T approximation and the assumption of stepwise constant drift and diffusion functions, see section|V]for details) 
also might ease the interpretation of direct estimates of drift and diffusion. 

A great deal of the methods developed in this work relies on the small-r approximation of the propagator, (|12p . 
which only is applicable at sufficiently large sampling frequencies. It is difficult to give an exact and general criterion 
which frequencies are required in each case. As a rule of thumb the sampling, however, needs to allow both for 
resolving the microscopic fluctuations at small time scales and a sufficiently large fraction of phase space. For cases 
where the accuracy of the small-r approximation is questionable it is recommended to validate the results by methods 
not explicitly incorporating this approximation as e.g. described in section HIl and in Isij. 

At several instances the efficacy of the approximates for the log-likelihood functions and the straightforward im- 
plementation of the estimation procedure were emphasized. The computational costs of the different approaches are 
compared in table Hill and show, that in particular the computational effort for the procedure outlined in section IVll is 
very low, in spite of the optimization of a five-dimensional parameter vector. The Python codes used for benchmark- 
ing including the implementation of the respective estimation procedures are listed in appendix [Bj I hope that these 
examples give ideas about potential implementations of the estimation procedure and stimulate the application of 
maximum likelihood estimators and Bayesian posterior confidence intervals for the analysis of relevant measurements. 
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APPENDIX 



Appendix A: Confidence intervals for the non-parametric estimates 



This appendix addresses the calculation of Bayesian posterior confidence intervals (BpCIs) for the non-parametric 
maximum likelihood estimates and classical confidence intervals (cCIs) for the direct estimates, ([2]). It uses nomencla- 
ture and variables as introduced in section [V] The respective confidence intervals were used for preparation of figure 

m 

For the individual bins BpCIs for drift and diffusion can be calculated by the method introduced in section IIIII For 
the drift coefficients the expressions can be evaluated analytically. With help of some algebra it can be shown that 
the log-likclihood offset calculated from Wilks' theorem independently of the sample size provides BpCIs exactly at 



the desired confidence level. For the drift in bin i the BpCI at at level is C. 



(1) 



with 



'^r^ = \/-^A^'^V2crf-iM , (Al) 

where evi~^{v) is the inverse of the error function at the desired confidence level v. This interval is consistent with 
classical confidence intervals for the mean of samples with known variance (which is a consequence of the fact that 
we investigate estimation errors in and D^^^ independent from one another as described in section llll| . For the 
95% confidence level one obtains V2erf"^(0.95) = 1.95996. 

As already previously noted for cCIs of direct estimates [s^ the size of BpCIs for the drift estimates diverges in the 
limit r — )• 0, which renders the fact that the estimation problem for the drift is ill-posed [s^l- Throughout this work 
we, however, could observe, that estimation at small but finite r yields satisfactory results and that the respective 
confidence intervals accurately reflected the statistical uncertainties of the drift estimates relevant at small time lags. 
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BpCIs of the diffusion estimates to our knowledge cannot be derived analytically but need to be iterated numerically 
from equation (|10a|) . For > 2 at least a closed expression of the confidence level lyl 'iR) of the BpCI [a,{R),b,{R)] 
associated to a certain R can be calculated by inserting (fTT]) into ()10bp . 

where (x, k) is the cumulative distribution function of the distribution at x with k degrees of freedom. This 
expression can either be used for verification of the confidence level for a constant R or for iteration of R (and 
boundaries of the corresponding confidence interval) until a desired confidence level is reached. For bins i with < 3 
BpCIs for the diffusion estimate cannot be calculated since (jlObp diverges. 

For the direct estimates, classical confidence intervals (cCIs) at confidence level v can be calculated from 
quantiles of Student's t and the distribution, respectively. For the drift estimates the confidence intervals are 



symmetric with respect to the estimated drift coefficients. For bin i the cCI at level v is 
with 



(J 



where Ff^{x, i) is the inverse cumulative distribution function of Student's t-distribution with i degrees of freedom, 
evaluated at x. For the significance level v = 0.95 relevant in this manuscript for large sample sizes -F^"^ (0.975, i — >■ 
oo) = 1.95996 is obtained, which equals the respective value for the BpCIs for the drift estimates calculated above. 

cCIs of the diffusion estimates are asymmetric. The boundaries [a^, bi] of the cCIs for the diffusion estimate in bin 
i at level ly can be calculated as 

-t(dI^A +ni \ ^ ' \ and (A4a) 



2 



V 2 



DT') , (A4b) 



where F^2^{x,i) is the inverse cumulative distribution function of the distribution with i degrees of freedom 

evaluated at x. Quantiles of distribution functions such as Student's t or the distributions in the meanwhile 
accurately are iterated numerically by standard packages as e.g. applied in the Python codes used for preparation of 
this manuscript as listed in appendix |B] 



Appendix B: Python code: Implementation of estimation procedures and benchmark 

This appendix contains listings of Python code used for preparation of the tables and figures contained in this 
manuscript. Python is available without charge for most software environments. The codes were tested with version 
2.6.5. 

The file python_procedures.py contains examples for implementations of the parametric estimation procedures de- 
scribed in sections IIVI (parMLE) and IVII (parBinMLE). In addition an implementation of the direct estimation 
procedure for the non-parametric estimation is included [uonParDirect), that yields classical confidence intervals for 
the estimates calculated according to appendix 1X1 

The file python_main.py exhibits an example of how the individual procedures can be used on a specific data set. 
This code has been used for calculation of the results and benchmarks listed in table IIIII 



MLE_estimation_procedures.py 



import scipy 






import scipy . optimize 






import scipy. stats 






import scipy . special 






import numpy 






# parMLE - PARAMETRIC MLE WITH FINITE TIME 
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 


APPROXIMATION 


(SECT. 4) 


def parNcgLogLikGlihood(pars ,dat,dGltat ,dl 


d2 ) : 




#Log- likelihood function , dependent on 


functions dl . d2 and data 




rcs=[] 
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for i in rangG(lon(dat)— 1): 








var=2.*d2(pars ,dat[i])*dcltat 








if var > 0: 








res . appcnd( scipy . log (2. * scipy . pi*var)/24- 








^dat ( i-f-lj' — (^clat [ ij-|-cil^para ,clat [ i JJ* dcltat ) ) -m -m z j ^^.*var ) ) 








else : 








res . append ( flo at ( " Inf " ) ) 








return faum( rot, ) 








parOonlFkt^conl , i , sigRatio ,dat , dcltat ,dl ,d2,paropt ,opt): 








#Roots at boundaries of the Id confidence interval of parameter i 








dev— numpy . zeros ( len (paropt )) 








dev [ i] = conf 








return (opt— parNcgLogLik;cliliood(paropt + dev,dat , dcltat ,dl,d2) 








-scipy . log ( sigRatio )) 




d 


ef 


parMLE (dat , dcltat ,dl ,d2,par0 , dcsiredSig ,xtol ): 








r e s=:numpy . zeros ([ len (parO ) ,3] ) 








#sig —numpy . zeros ( len ( parO ) ) 








sigRatio = scipy .cxp( — scipy . stats . clii2 .ppf(desircdSig ,l)/2) 








# F in d optimal parameters with simplex gradient method 








print " parMLE : _Finding_bGst_parameters : " 








res [: ,0] = scipy . optimize . fmin(parNcgLogLikclihood , parO , 








args = [dat,deltat ,dl,d2] ,xtol=xtol .disp = Fals 








opt = parNegLogLikeIihood (res[: ,0] , dat, dcltat ,dl,d2) 








#C alculate width of condidence intervals 








for i in range (len (parO ) ) : 








print Parameter , i +1 : " , r es [ i ,0] , " , _BPCI : _ (" , 








#find lower bound for bisection root finding algorithm 








res [i ,1]=-1. 








while parConfFkt(rcs [i ,1] ,i , sigRatio ,dat , dcltat ,dl,d2, res [: ,0] ,opt)>=0: 






res [i ,l] = rcs [i ,1] -1. 








#bisection 








res [1 j-i-J' — scipy . opxi m ize- Disecx^^parv^ontrKt ,res[i jij tU, 








,args = (i , sigRatio, dat, deltat ,dl,d2,res[: ,0] ,opt),xtol=lc — 


-3) 






^ f i n d upper bound for bisection root finding algorithm 








re> [i ,2] = 1. 








w iilje pari^oniJrKt^res [i ,i ,sig ri. atio .aat .Qcitat ,cii,a.z,res [: rUJ . o ) j> \.i . 








res [i ,2J — res |i ,2J + 1. 








#bisection 








res[i,2] = scipy. optimize. biscct(parConfFkt ,0. ,res[i ,2] 








args — (i sigRatio dat deltat dl d2 res[' 0] opt) xtol — le — 








prinx res[iTUj'i~res[i jJ-j j , ,res[i,uj'-t"rcs[i i-sj j } 




if- 

# 


pa 


.rBinMLE — PARAMETRIC MLE , FINITE TIME APPR. AND SPATIAL DISCRET . (SECT. 


e) 


d 


ef 


oinNegLogLikclihood (pars ,incmcan,incmcansq ,n, deltat ): 








T^IiO g — likelihood function for bin — wise estimation based on statistical pars 








i f ( pars [1] > ) : 








vai:=2.*pars [1] * deltat 








res =n / 2*( scipy . log (2.* scipy . pii-var ) 








-|-(incmeansq — 2.*pars [0]* dcltat 'i'incmean-|-(pars [0])*'t'2* deltat 'i''i'2)/var 


) 














res = f loat ( " Inf" ) 








return res 






ef 


parBinNcgLogLikcliliood(pars ,incmcan , incmcansq ,incn ,xCenter ,dl ,d2 , deltat ) : 








# R o o ts at boundaries of the Id confidence interval of parameter i 








res = [] 








for i in range ( len (incn )) : 








res.appcnd(binNcgLogLikcliliood ([dl(pars ,xCcnter[i]) ,d2(pars , 








xCenter[i])] ,inemean[i] ,inemeansq[i] ,inen[i] , deltat)) 








return sum(res) 








parBinConfFkt(conf , i , sigRatio , incmean , incmcansq , incn, xCentcr, deltat , 








dl , d2 , paropt , opt ) : 








#Roots at boundaries of the Id confidence interval of parameter i 








de v=:numpy . zeros ( len (paropt )) 








dev [ i]-^conf 








return (opt— parBinNegLogLikelihood( paropt+dev , incmean , incmcansq , incn , 








xCenter ,dl ,d2 , dcltat)— scipy . log (sigRatio 


)) 






parBinMLE( dat , dcltat ,nbin ,dl,d2,par0 , desiredSig ,xtol ): 








r c s=numpy . zeros ([ len (parO ) ,3] ) 








sigRatio = scipy .cxp(— scipy . stats . chi2 .ppf( desiredSig ,l)/2) 








#init binning 
xmax=numpy . max (dat) 








xmin=numpy . min ( dat ) 








deltax = (xmax-xmin) /nbin 








X L e f t=numpy . array (range (nbin))* dcltax -|-xm i n 








xCenter— xLeft + deltax /2 








i nc me an=numpy .zeros (nbin) 








incmeans q— numpy . zeros (nbin) 








i n c n=numpy . zeros (nbin ,dty p e=numpy . i n t ) 








^calculate statistics of bins 








for j in range (0 , nbin ) : 








#s el e ct relevant data 








j D at=numpy .where ((dat [: len (dat)— 1] 








<xLeft [j]+dBltax)&(dat [: len(dat)-l]>=ixLeft [ 


j D) 






incn [ j]-len (jDat [0] ) 








inc=dat [jDat [0] + l] — dat [jDat [0] ] 








incmean [j] = snm(inc)/inen [j ] 








incmcansq [j]=sum( i n c * * 2 ) / i n c n [ j ] 








# F ind optimal parameters with simplex gradient method 








print " parBinMLE : _ F i nd i n g _ b e s t _p ar ame t e r s :" 








res [: ,0] = scipy . optimize . fmin(parBinNegLogLikelihood , parO , 








args=[incmean,incmeansq ,incn ,xCenter,dl,d2, deltat] ,xtol = xtol ,disp = Fals 


o) 






opt = parBinNegLogLikelihood (res [: ,0] , incmean, incmcansq ,incn , 








xCentcr , dl , d2 , delta 


t ) 






# Calculate width of condidence intervals 








for i in range ( len (parO )) ; 








print " Parameter_" , i + 1 , " : " , r e s [ i , ] , " , _BPCI : _(" , 
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#find lower bound for bisection root finding algorithm 
res [i ,1]=-1. 

while parBinConfFkt(i:os[i ,1] ,i,sigRatio .incmcan, incmcansq , 

incn,xCGntcr.doltat ,dl,d2,res[: ,0] ,opt)>=0: 

res [i ,l] = rcs [i ,1] -1. 

#bisection 

res [i ,l] = scipy . optimize, bisect ( p ar B i n C o nf F kt ,res[i ,1] ,0. , 
args=(i .sigRatio ,incmcan, incmcansq , 

incn,xCcntcr,dcltat ,dl,d2,rcs[: ,0] ,opt),xtol=lc-3) 

# f i n d upper bound for bisection root finding algorithm 
res [i ,2] = 1. 

while parBinConfFkt( res [ i ,2] , i , sigRatio , incmcan , incmcansq , 

incn.xCcntcr.dcltat ,dl,d2,i:es[: ,0] ,opt)>0: 

res [i ,2] = rcs [i ,2] + l. 
#b, section 

res [i ,2] = scipy . optimize, bisect ( p ar B i n C o nf F kt ,0.,rcs[i,2], 
args=(i , sigRatio , incmcan, incmcansq , 

incn,xCcntcr,dcltat .dl,d2,rcs[: .0] ,opt),xtol=lc-3) 
print res[i,0]+rcs[i,l],S',rcs[i,0]+rGs[i,2],')' 

# nonPar Direct - DIRECT EST. (FINITE TIME APP . AND SPATIAL DISCRET ) (APPENDIX A) 
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 

def nonParDirect(dat , deltat ,nbin , significanceLevel ): 

# Direct estimation of drift and diffusion in nbin bins from time series 

res =numpy . zeros ([nbin ,2 ,3]) 

i n c n=numpy . zeros (nbin ,dty p c=numpy . i n t ) 
#b,nn,ng 

xmax— numpy . max( dat ) 
xm in=iiumpy . min ( dat ) 
deltax = (xiiiax-xmin)/ nbin 

X L e f t =numpy . array (range (nbin))* dcltax +xm i n 
xCenter=xLBft + deltax /2 

#pr o c es 3 individual bins sequentially 

print 'parBinMLE:_ProcGssing_' ,nbin , '_bins_SGquGntially : ' 
for j in range (0 , nbin ) : 

# s ele ct relevant data 

j D a t =numpy .where ((dat [: len(dat) — 1] 

<xLeft [j] + deltax)&(dat [: len(dat)-l]> = xLeft [j ])) 

inen [ j]=len (jDat [0] ) 

if incn [j] >1: 

print j , xCenter [)],'_', 

^calculate increments 

— dat [jDat [0] + l] — dat [jDat [0] ] 
n,ean==um{inc)/i„cn [j] 
meanBq=3um( inc..2)/inen[j] 
nieanquad=sum( ine**4)/inen[j] 



#£>!rect est 
rea [ j ,0 ,0] = 
res [j ,1,0]= 



1 o/ dl and d2 
r / de 1 1 a t 
nsq/deltat /2 



^Standard errors for dl 

ro3[j,0,2] = (aeipy.EitatEi.t.ppt((l.+ 3igniticancGLevel)/2,inen[j]-l) 

.Hcipy . Hqrt ( ( i nc me ana q -i n c m e an . . 2 ) / i n c n [j])/doltat) 
res [j ,0,l]=-res [j ,0 ,2] 

#Standard errors for d2 

rea[j,l,l] = deltat/2..rGa[j,0,0]»»2-reEi[j,l,0]+l.*inen[j].( 
rea[j,l,0]-deltat/2..roa[j,0,0]««2)/( 

aeipy.atata.ehi2.ppt((l.+ HignitieancoLevel)/2,l..(inen[j]-l))) 
rea[j,l,2]=doltat/2..rea[j,0,0]»»2~res[j,l,0]+l.«inen(j]»( 
rea[j,l,0]-deltat/2..rea[j,0,0]».2)/( 

aeipy.atata.chi2.ppt((l.~aignifieaneeLevel)/2,l..(incn[j]-l))) 
print 'Dl:_',roH[j,0,0],', _CI : _( ' , 

print res[j,0,0] + roa[j,0,l],',-,reB[j,0,0] + rea[j,0,l],')-', 
print •D2:_',res[j,l,0],' , _CI : _( ' , 

print res [j ,1 ,0] + roa [j ,1 ,1] , ' , ' ,reB [j ,l,0] + rea [j ,1 ,2] , ')_' 



main.py 

#=== PARAMETERS FOR SIMULATION AND ANALYSIS ^^^=^===============^ 

# 

# SIMULATION OF OU PROCESS 

gamma= 1 . 
d=1.0 

deltat =0.01 

dim=10000 

# 

# MLE ESTIMATION 

atartingParametera=[l. ,1. ,1. ,1. .1.] 

doairedSig=0.95 

# 

# PARAMETRISED FUNCTIONS 

#- DRIFT: 

def dl (para , x ) : 

return para [0] • x+para [l]»x**2 + para [2]«x«»3 

# 

#- DIFFUSION : 
def d2 (para , x ) : 

return p a r a [ 3 ] + p a r a [ 4 ] * x * * 2 

# 



w- 




import random 




import Bcipy 




import scipy . optimize 




import numpy 




import time 




from MLE_cstimation_proccdures 
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#A) ^^^^^^ SIMULATION OF DATA SET ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
print '"Generating _OU-data_ ..." 

x=numpy . array ( numpy . zeros (dim)) 
sfact = scipy . cxp( — gamma* d c 1 1 a t ) 

sdGV = scipy . sqrt ( d /gamma* (1.— scipy . cxp(— 2. * gamma* d ol t a t ) ) ) 

# SAMPLING OF THE FINITE TIME PROPAGATOR 
for i in range ( 1 , dim ) : 

X [ i ] = random . normalvariate(x[i - l]*sfact , sdcv) 

#B ) ^^^^^^ DATA ANALYSIS ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
print " Analysing_data_ . . . " 
t=time . time ( ) 

# PARAMETRIC MLE WITH FINITE TIME APPROXIMATION 

parMLE (x, deltat ,dl,d2, startingParameters .desiredSig ,xtol=lc— 8) 
print "Timc_elapsGd : ,timc . time() — t , " s" 
t=timc . time { ) 

# DIRECT ESTIMATION PROCEDURE ACCORDING TO SIEGERT . FRIEDRICH . PEINKE ET AL . 
nonParDirect{x, deltat ,100 , dcsiredSig) 

print " Time_elap SG d : , t i m c . t i m e { ) — t , " s " 
t=time . time ( } 

# PARAMETRIC MLE WITH FINITE TIME APPROXIMATION AND SPATIAL DISCRETIZATION ( BINS ) 
parBinMLE{x , deltat , 100, dl,d2, startingParameters ,dGsiredSig ,xtol=le-8) 

print "Time_elapsGd : , time . time() — t , " s" 
t=time . time () 
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